Abstract

The low cost and availability of clusters of workstations have led researchers to re-explore
distributed computing using independent workstations. This approach may provide better
cost/performance than tightly coupled multiprocessors. In practice, this approach often utilizes
wasted cycles to run parallel jobs. In this paper we address the feasibility of such a non-dedicated
parallel processing environment assuming workstation processes have preemptive priority over
parallel tasks. We develop an analytical model to predict parallel job response times. Our model
provides insight into how significantly workstation owner interference degrades parallel program
performance. A new term, task ratio, which relates the parallel task demand to the mean service
demand of non-parallel workstation processes, is introduced. We propose that task ratio is a useful
metric for determining how large the demand of a parallel application must be in order to make
efficient use of a non-dedicated distributed system.
1 Introduction

Most early parallel processing research focused on using distributed systems to speed up
computations. The basic approach was to utilize many computers connected via a local area network
(LAN) to execute a parallel job. We will refer to this environment as distributed computing. With
the advent of multiprocessor architectures the majority of the focus shifted from distributed
computing to multiprocessing, the major distinction being the tightly coupled architecture allowing
more finely grained parallelism.

Recently, a significant portion of the parallel community has returned to the distributed processing
approach. Several commercial and noncommercial tools have been developed to support distributed
computing. One widely used tool is the Parallel Virtual Machine (PVM) project [9, 5, 1, 2].
According to the authors, PVM is now being used at more than 100 sites. A major driving force behind
the reevaluation of distributed computing is the high cost of parallel computers. Using a group of
workstations connected via a LAN may provide better cost/performance, or may be the only way to
achieve high performance within budget constraints for some organizations. Another factor in favor
of distributed computing is the availability of many lightly loaded workstations. These otherwise
wasted idle cycles can be used by a distributed computation to provide speedups and/or to solve
large problems that otherwise could not be tackled.

It is clear that many problems are amenable to the distributed computing approach [3]. However,
for some applications, the inherent synchronization requirements, communication/computation ratio,
and the granularity of parallelism may limit the obtained performance. Even for the "good"
applications, a tacit assumption behind the expected high performance is that a system of dedicated
workstations is used, which may not be true in practice. In this paper we study the performance of
distributed computing in a non-dedicated system assuming workstation owner processes have
preemptive priority over parallel tasks.

We  assume  the  parallel  application  considered  belongs  to  the  class  of  programs  that  can  run 
efficiently  in  a  dedicated  distributed  computing  environment.  We  do  not  consider  the  effects  of 
synchronization,  communication,  or  granularity  of  parallelism.  Given  the  program  executes  efficiently 
in  a  dedicated  system,  we  wish  to  determine  whether  we  can  achieve  good  performance  in  a  non- 
dedicated  system. 

One factor that must be considered in a non-dedicated system is how intrusive the parallel programs
are to the owners of the workstations and vice versa. The priority of the parallel tasks relative
to the priority of processes initiated by the owner of the workstation can have a significant impact
on the performance of both the parallel job and the owner's serial jobs. We assume that a workstation
owner is not tolerant of other people using their workstation, and hence surmise the most appropriate
model of such a system is to assume workstation owner processes have preemptive priority over
processes belonging to a parallel job. Hence, use of the workstation will interfere with parallel
program performance. The major goal of this paper is to provide insight into how significantly
workstation owner interference degrades parallel program performance. We seek to answer the question,
"When is distributed computing in a non-dedicated environment where workstation owner processes have
preemptive priority over parallel tasks a viable approach?"

An  analytical  model  is  developed  to  predict  the  performance  under  the  non-dedicated  assumption. 
The  new  term  task  ratio  is  introduced  along  with  new  metrics  that  incorporate  the  utilization  of 
workstations  by  owner  processes.  We  find  that  the  task  ratio  plays  an  important  role  in  the  overall 
performance,  possibly  as  important  as  the  communication/computation  ratio  in  a  dedicated  system. 
The  analytical  model  provides  the  relationships  between  the  identified  parameters  and  shows  how 
these  parameters  influence  the  overall  response  time. 

In  addition  to  our  analysis,  a  hypothetical  local  computation  [11]  problem  is  implemented  with 
PVM  on  systems  with  1  to  12  homogeneous  workstations.  These  initial  experimental  results  confirm 
the  qualitative  results  from  the  analytical  model. 

This  paper  is  organized  as  follows.  In  Section  2  we  present  the  analytical  model  and  introduce 
new  parameters  and  metrics  for  non-dedicated  distributed  computing.  The  results  from  our  analysis 
are  presented  in  Section  3.  Experimental  results  with  PVM  on  12  homogeneous  workstations  are 
presented  in  Section  4,  and  our  conclusions  are  in  Section  5. 

2  Model  Description,  Analysis  and  Simulation 

In  this  section  we  describe  our  system  model,  our  analysis  technique,  and  simulation  model.  We 
make  simplifying  assumptions  that  favor  the  distributed  computing  approach.  In  particular,  we 
assume  a  parallel  job  is  composed  of  W  tasks  (one  per  workstation),  and  the  computation  is  perfectly 
balanced  among  these  tasks.  In  addition,  the  parallel  job  is  composed  of  one  single  parallel  phase 
with  no  communication  or  synchronization  requirements  other  than  the  final  synchronization  which 
occurs  when  all  of  the  tasks  have  completed.  Hence,  we  are  assuming  perfect  parallelism  of  the 
problem. This model is simplistic, but provides the best case scenario for a distributed computing
environment. In addition, by not incorporating communication or synchronization requirements into
the model we are able to attribute all degradation of parallel program performance to workstation
process interference. Since our assumptions are always optimistic, the model predictions provide an
upper bound on expected performance.

We assume there are W homogeneous workstations in the system and that there is one owner per
workstation. Workstation owners are in a continuous cycle of thinking (idle time) and then use time.

Table 1: Notational Definitions

  Symbol   Definition
  J        Total demand of the parallel job.
  W        Number of workstations in the system.
  T        Demand of one parallel task = J / W.
  O        Time an owner process uses the workstation.
  U        Utilization of a workstation by its owner.
  P        Probability of the owner requesting the processor during a given time step.
  E_t      Mean expected task completion time.
  E_j      Mean expected job completion time.

We assume there is one parallel job being executed on the system at a time.

In Table 1 we define the notation used throughout the paper. The demand of a job is the total
computing cycles (time) needed for the job.

2.1  Model  Description 

Our model is a discrete time model. We assume a geometric distribution with mean 1/P for the owner
think time, i.e. at each time unit the owner requests the processor with probability P. When an owner
process starts execution, an executing parallel task is suspended and the owner process is immediately
started. The owner process executes for O units. Once the owner process completes execution, the
parallel task restarts execution and is guaranteed to complete at least one unit of work before the
owner may issue another process requesting the processor.

The model guarantees the parallel task will complete in at most T + (T x O) units. Task execution
time at a single workstation is thus the sum of task demand plus the time to complete any owner
processes that occur during the task's tenure in the system, i.e.

    \text{task time} = T + (n \times O),    (1)

where n equals the number of owner process requests. The owner process can make a request after
each unit of time the parallel task uses the processor, hence the number of owner requests is
binomially distributed:

    \mathrm{Bin}(T, n, P) = \binom{T}{n} P^{n} (1 - P)^{T - n}.    (2)
Thus, expected task execution time is equal to

    E_t = T + \sum_{i=0}^{T} O \, i \, \mathrm{Bin}(T, i, P).    (3)
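
As a quick check of equations (1) through (3), the sum can be evaluated directly. The sketch below is only illustrative: the function names and the parameter values (T = 10, O = 10, P = 0.025) are assumptions chosen for the example, not part of the model definition. Because the binomial mean is T x P, the sum should agree with the closed form T + O x T x P.

```python
from math import comb

def binomial_pmf(T, n, P):
    """Bin(T, n, P): probability of exactly n owner requests during T task units."""
    return comb(T, n) * P**n * (1 - P) ** (T - n)

def expected_task_time(T, O, P):
    """Equation (3): E_t = T + sum over i of O * i * Bin(T, i, P)."""
    return T + sum(O * i * binomial_pmf(T, i, P) for i in range(T + 1))

# Illustrative values (assumptions): 10-unit task, 10-unit owner process, P = 0.025.
T, O, P = 10, 10, 0.025
print(expected_task_time(T, O, P))  # summation form of equation (3)
print(T + O * T * P)                # closed form using the binomial mean T * P
```

The direct use of `comb` is adequate for moderate T; for very large T a recurrence on the probabilities would avoid large intermediate integers.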

The  job  execution  time  is  the  time  until  the  last  of  the  parallel  tasks  completes  execution.  Thus, 
job  completion  time  is  at  least  T  units  and  at  most  T  +  (T  x  O)  units.  We  first  derive  the  probability 
that  job  execution  time  equals  i  and  then  from  these  probabilities  get  the  expectation. 

Let  S[n]  equal  the  probability  that  an  individual  task  is  interrupted  by  at  most  n  owner  processes. 


    S[n] = \sum_{i=0}^{n} \mathrm{Bin}(T, i, P).    (4)

Let  C[W,n]  equal  the  probability  that  all  parallel  tasks  are  interrupted  by  at  most  n  owner 
processes.  By  independence, 


    C[W, n] = (S[n])^{W}.    (5)


Let  Max[W,n]  equal  the  probability  that  the  maximum  number  of  owner  process  interferences 
over  all  the  parallel  tasks  is  equal  to  n. 


    \mathrm{Max}[W, n] = C[W, n] - C[W, n - 1],    (6)

where C[W, -1] is taken to be 0. Using these functions, expected job execution time is calculated as:

    E_j = T + \sum_{n=0}^{T} O \, n \, \mathrm{Max}[W, n].    (7)
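
The chain from S[n] to the expected job execution time can be sketched in a few lines. This is a minimal illustration, not the analysis code used for the results: the function names are hypothetical, C[W, -1] is assumed to be 0, and the example parameters (T = 10, O = 10, P = 0.025, W = 100) are chosen only for demonstration.

```python
from math import comb

def binomial_pmf(T, n, P):
    """Bin(T, n, P) from equation (2)."""
    return comb(T, n) * P**n * (1 - P) ** (T - n)

def expected_job_time(T, O, P, W):
    """Expected job time: T plus O times the expected maximum interruption count."""
    s = 0.0        # S[n]: probability one task sees at most n interruptions
    prev_c = 0.0   # C[W, n-1], assumed to be 0 before n = 0
    extra = 0.0    # accumulates O * n * Max[W, n]
    for n in range(T + 1):
        s = min(1.0, s + binomial_pmf(T, n, P))  # equation (4), guarded against rounding
        c = s ** W                               # equation (5)
        extra += O * n * (c - prev_c)            # equations (6) and (7)
        prev_c = c
    return T + extra

# With a single workstation the job time reduces to the task time T + O * T * P.
print(expected_job_time(10, 10, 0.025, 1))
# With many workstations the slowest task dominates.
print(expected_job_time(10, 10, 0.025, 100))
```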

Owner utilization (U) can be calculated as:

    U = \frac{O}{O + 1/P}.
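
Since the experiments are parameterized by owner utilization rather than by P, it is convenient to invert the utilization relation. Solving U = O / (O + 1/P) for P gives P = U / (O (1 - U)). The helper names below are hypothetical and the values are illustrative only.

```python
def utilization(O, P):
    """Owner utilization U = O / (O + 1/P)."""
    return O / (O + 1.0 / P)

def request_probability(O, U):
    """Inverse relation: P = U / (O * (1 - U)), valid for 0 <= U < 1."""
    return U / (O * (1.0 - U))

# Example (assumed values): owner processes of 10 units at 20% utilization.
P = request_probability(10, 0.20)
print(P)                    # per-unit request probability
print(utilization(10, P))   # recovers the utilization
```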


For the purposes of analysis we were forced to make some simplifying assumptions. Our model
makes assumptions that favor the distributed computing approach, hence the model provides a lower
bound on expected response time. In particular, the model is optimistic with regards to the three
following points:
•  We assume parallel task times are deterministic. Although this is one of the goals of parallel
algorithm design, in practice there is often some imbalance of load.

•  Variance of owner process service demands. We have assumed a deterministic owner process
service demand when in fact typical processes experience a much larger variance [7]. Assuming
a distribution with more variance could cause some parallel tasks to be delayed much longer
than T + (T x O).

•  Guaranteeing the parallel task at least one unit of execution between requests. In a real system
owner processes may be reissued in less time, thus parallel tasks could be delayed longer than
(T x O).

These assumptions together clearly show that our results are optimistic, and hence actual
performance could be worse than predicted by our observations.

2.2  Simulation  Description 

We have simulated the system using the CSIM simulation language [8]. The purpose of the simulation
is solely to validate the coding of our analysis. We intend to use our simulation in future work to
explore other service demand distributions.

All results have confidence intervals of 1 percent or less at a 90 percent confidence level. Confidence
intervals are calculated using batch means [4] with 20 batches per simulation run and a batch size
of 1000 samples. We duplicated the experiment found in figure 1 of this paper and the simulation
results were identical to the analysis, thus verifying the correctness of the analysis code. We did
not plot the results since they are indistinguishable from the analysis.
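
The same kind of cross-check can be sketched with a plain Monte Carlo simulation of the discrete time model. This is not the CSIM program described above and it omits batch means; it is a minimal stand-in that assumes the owner may request the processor once after each unit of parallel work, as in Section 2.1. The function names, seed, and run counts are illustrative assumptions.

```python
import random

def simulate_task(T, O, P, rng):
    """Elapsed time for one parallel task of demand T on one workstation."""
    elapsed = 0
    for _ in range(T):
        elapsed += 1              # the parallel task completes one unit of work
        if rng.random() < P:      # the owner requests the processor after this unit
            elapsed += O          # the task is suspended while the owner process runs
    return elapsed

def simulate_job(T, O, P, W, rng):
    """A job finishes when the slowest of its W tasks finishes."""
    return max(simulate_task(T, O, P, rng) for _ in range(W))

def mean_job_time(T, O, P, W, runs, seed=0):
    """Average job time over independent runs with a fixed seed."""
    rng = random.Random(seed)
    return sum(simulate_job(T, O, P, W, rng) for _ in range(runs)) / runs

# One workstation: the mean should approach T + O * T * P.
print(mean_job_time(10, 10, 0.025, 1, runs=20000))
# One hundred workstations: the mean is driven by the slowest task.
print(mean_job_time(10, 10, 0.025, 100, runs=2000))
```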

3  Analysis  Results 

In this section we present the results from our analysis. All results in this section assume an owner
process has preemptive priority over a parallel task. We first present results for a fixed size problem,
and then discuss the impact of scaling problem size with the number of workstations.

3.1  Fixed-Size  Speedup 

We first address the benefit of the distributed computing approach for a fixed size job. In this
case, the desired goal of parallelizing the program is to achieve faster execution times, hence we use
expected speedup as our primary metric. Since the standard definition of speedup does not take into
consideration the cycles consumed by the (higher priority) owner processes, we also define the metric
weighted-speedup. We also consider the metrics efficiency and weighted-efficiency to illustrate more
concretely the achieved percent of optimal performance. Specifically, once again let J equal the total
job demand, W equal the number of workstations, E_j equal the expected job completion time, and
U equal the owner process utilization of the workstations. Then:


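    \text{Task Ratio} = \frac{T}{O}

    \text{Speedup} = \frac{J}{E_j}

    \text{Weighted-Speedup} = \frac{J}{(1 - U) E_j}

    \text{Efficiency} = \frac{J / W}{E_j}

    \text{Weighted-Efficiency} = \frac{J / W}{(1 - U) E_j}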

The expected speedup and efficiency metrics are of interest if a user wishes to determine the
benefit of parallelizing the job relative to running the program on a single dedicated machine. The
weighted metrics incorporate utilization to clearly demonstrate how effectively the parallel program
is able to use the idle system cycles. We focus primarily on the weighted metrics since they provide
a better metric for determining how well the distributed computing approach can utilize idle cycles.
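
To make the relationship between the plain and weighted metrics concrete, the sketch below computes all five quantities from J, W, O, U and a given E_j. It assumes the weighted metrics divide by (1 - U) E_j, i.e. they compare against the cycles left over by the owner. The function name and the input values, including E_j = 25, are placeholders for illustration and are not results from the model.

```python
def metrics(J, W, O, U, E_j):
    """Speedup and efficiency metrics for a job of demand J on W workstations."""
    T = J / W  # demand of one parallel task
    return {
        "task_ratio": T / O,
        "speedup": J / E_j,
        "weighted_speedup": J / ((1 - U) * E_j),
        "efficiency": T / E_j,
        "weighted_efficiency": T / ((1 - U) * E_j),
    }

# Placeholder inputs (assumptions): J = 1000, W = 100, O = 10, U = 20%, E_j = 25 units.
for name, value in metrics(J=1000, W=100, O=10, U=0.2, E_j=25.0).items():
    print(name, value)
```

With these inputs the weighted values exceed the plain ones by the factor 1 / (1 - U), which is the only difference between the two families of metrics under this assumption.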

In figure 1 we plot speedup versus the number of workstations for workstation utilizations of
1%, 5%, 10%, and 20%, assuming a parallel job demand (J) equal to 1000 units and an owner
process demand (O) equal to 10 units. For a given utilization we assume all workstations have the
same owner process utilization. The top curve is the theoretical optimal speedup, i.e. unitary linear.
The speedup curves are concave increasing, i.e. the benefit of adding more nodes decreases as nodes
are added, despite ignoring overhead for parallelizing the program (synchronization, communication,
non-balanced load, etc.). At 100 nodes the speedup for a system with only 1% utilization is only
about 61% of the optimal speedup; for a 20% utilization the speedup is only 32.5% of the optimal
speedup.

To present the efficiency of the system, i.e. how close to optimal speedups are achieved, we plot
efficiency versus number of nodes in figure 2.

In both of the preceding plots we compare the performance of the parallel program executed on
a system of workstations with a given owner utilization to that of the same program executed on a
single node with no owner utilization. To focus on how effectively distributed computing utilizes
wasted cycles we consider the weighted-speedup and weighted-efficiency metrics. In figures 3 and 4 we
plot weighted-speedup and weighted-efficiency versus the number of nodes for the same parameters as
in figures 1 and 2. Note the weighted-efficiency is still only 61.5% (41%) for a utilization of 1% (20%).
Hence, even once owner utilization is taken into consideration, achieved performance is significantly
worse than optimal.

One  cause  for  the  degradation  of  performance  is  that  the  probability  of  one  of  the  workstations 
experiencing  a  transient  period  of  high  utilization  increases  as  the  number  of  nodes  increases.  Since 
the  parallel  job  must  wait  for  each  task  to  complete  execution,  just  one  workstation  experiencing  a 
transient  high  utilization  will  slow  down  the  entire  computation,  hence  performance  degrades  as  the 
number  of  workstations  increases. 

A second more subtle cause of performance degradation results from a decrease in the ratio of
parallel task time to owner process task time (task ratio). To demonstrate this effect consider what
happens if we increase the parallel job demand from 1K units to 10K units. In figures 5 and 6 we plot
the weighted-speedup and weighted-efficiencies for the same experiment as in figures 3 and 4, except
job demand equals 10K. The weighted-speedups and weighted-efficiencies for a job demand of 10K
units are much higher than their counterparts in figures 3 and 4. For J equal to 10K, T equals 100
units for a 100 workstation system, whereas J equal to 1K results in a T equal to 10 units for a
100 workstation system. Tasks of demand 10 units experience a proportionally larger delay by owner
processes than tasks requiring 100 units.

To more clearly illustrate the point, we plot weighted-efficiency versus the task ratio for a system
with 60 workstations in figure 7. (The plot for weighted-speedups is identical except the y-axis is
scaled from 0 to 60 instead of 0 to 1.) From the figure we conclude that in order to achieve acceptable
efficiencies, and thus good speedups, we must ensure that the parallel task demand is sufficiently large
relative to the average demand of owner processes, i.e. we must ensure a large task ratio.
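
The dependence of weighted-efficiency on the task ratio can be explored numerically from the model of Section 2. The sketch below is an illustration under stated assumptions: weighted-efficiency is taken as T / ((1 - U) E_j), P is obtained from U via P = U / (O (1 - U)), and the function names and the 5% utilization used in the example are choices made here, not values read from figure 7.

```python
from math import comb

def expected_job_time(T, O, P, W):
    """E_j from the analytical model: T plus O times the expected maximum
    number of owner interruptions over W independent tasks."""
    s, prev_c, extra = 0.0, 0.0, 0.0
    for n in range(T + 1):
        s = min(1.0, s + comb(T, n) * P**n * (1 - P) ** (T - n))  # S[n]
        c = s ** W                                                # C[W, n]
        extra += O * n * (c - prev_c)                             # O * n * Max[W, n]
        prev_c = c
    return T + extra

def weighted_efficiency(task_ratio, O, U, W):
    """Assumed definition: (J / W) / ((1 - U) * E_j) with T = task_ratio * O."""
    T = task_ratio * O
    P = U / (O * (1 - U))
    return T / ((1 - U) * expected_job_time(T, O, P, W))

# Sweep the task ratio for 60 workstations at an assumed 5% owner utilization.
for ratio in (1, 2, 5, 8, 10, 20):
    print(ratio, round(weighted_efficiency(ratio, O=10, U=0.05, W=60), 3))
```

Sweeping `W` as well as `task_ratio` gives a quick way to see how the required task ratio grows with system size.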

In the previous experiment we fixed the number of workstations equal to 60. In figure 8 we
plot the weighted-efficiency versus task ratio for various system sizes for an owner utilization of 10%.
Sensitivity to the task ratio increases with system size.

One  of  the  main  conclusions  from  these  experiments  is  that  in  order  to  achieve  good  speedups  for 
fixed  size  problems,  it  is  essential  that  the  task  ratio  be  sufficiently  large.  Similar  to  the  computation 
to  communication  ratio  being  an  important  consideration  for  parallel  computations,  the  task  ratio  is 
an  important  factor  in  non-dedicated  distributed  computing. 

3.2  Scaled  Problem  Size 

We now consider the effect of scaling the problem size with the number of nodes. We assume job
demand scales linearly with the number of workstations. This type of scaling has been called
memory-bounded scaleup [10]. With memory-bounded scaleup and perfect parallelism, ideally, we may
be able to complete W times the amount of work in the same time as the original problem on a single
workstation by using a system with W nodes [12]. In figure 9 we plot job execution time versus the
number of workstations assuming job demand is equal to 100 units times the number of workstations.
Since the problem size scales, the parallel task demand is a constant 100 units, and hence, the task
ratio is fixed at 10. Initially there is a sharp increase in response time as system size increases, but
the increase diminishes as system size becomes large. For system utilizations of 1, 5, 10, and 20%, the
response time for a problem using 100 workstations increases by 14, 30, 44, and 71% relative to the
response time for a problem using one workstation with the same owner utilization. In other words,
the distributed computing approach offers the potential to increase the problem size by a factor of
100 and only increase response time by 44% assuming all workstations have a utilization of 10%.
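
The scaled experiment is straightforward to evaluate with the analytical model, since the task demand stays fixed while W grows. The sketch below assumes T = 100 and O = 10 as in the text and derives P from U via P = U / (O (1 - U)); the function names are hypothetical, and the printed values are model evaluations produced by this sketch rather than figures read from the plot.

```python
from math import comb

def expected_job_time(T, O, P, W):
    """E_j from the analytical model of Section 2 (maximum over W tasks)."""
    s, prev_c, extra = 0.0, 0.0, 0.0
    for n in range(T + 1):
        s = min(1.0, s + comb(T, n) * P**n * (1 - P) ** (T - n))  # S[n]
        c = s ** W                                                # C[W, n]
        extra += O * n * (c - prev_c)                             # O * n * Max[W, n]
        prev_c = c
    return T + extra

def scaled_job_time(U, W, T=100, O=10):
    """Job time when job demand is T * W, so each task keeps demand T."""
    P = U / (O * (1 - U))
    return expected_job_time(T, O, P, W)

# Response time on 1 and on 100 workstations for several owner utilizations.
for U in (0.01, 0.10, 0.20):
    print(U, round(scaled_job_time(U, 1), 1), round(scaled_job_time(U, 100), 1))
```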

Memory-bounded scaleup exhibits better performance than fixed-size computing since the task
ratio is fixed, while the task ratio in fixed-size computing decreases with an increase in the number of
workstations. We also considered larger job demands and found the increase in response time to be
even less. Hence, we conclude that the distributed computing approach offers significant potential for
scaling of problems even if workstation owner processes are granted preemptive priority over parallel
tasks.


4  Experimental  Validation 

In this section we present preliminary results from experimental studies to validate the analysis. In
these initial studies we focus only on fixed size problems. We have chosen to implement our parallel
program using the PVM package. We chose the PVM package based on the package being well known
and highly available. We made no attempt to compare the PVM package with any other distributed
computation packages.

To isolate the effects of workstation owner interference we assume the parallel program is a
local computation problem [11]. That is, the problem has perfect parallelism and no interprocess
communication. The parallel program forks W parallel tasks, one for each workstation in the system,
and each task executes independently. Each parallel task is "niced" (runs at low priority), granting
workstation owner processes preemptive priority over the parallel tasks.

Our primary metrics are maximum task execution time and speedup. The most common metric
for a study such as this is job response time, i.e. the time from when the parallel job is started until
it completes. This metric is influenced by the overhead of the parallel computing package for
initiating the processes and collecting the results. We want to focus only on the interference of
workstation owner processes and thus rejected defining response time in this standard way. Instead,
we focus on the maximum task execution time. This time was obtained by having each task record the
system time when it started computation and noting the system time immediately when completing
computation. Each of the parallel tasks then returns its task execution time to the master process,
which selects and reports the maximum. By considering the maximum task execution time we isolate
the impact of workstation owner process interference.

We report the results from one experiment. Further experiments are currently being conducted.
The system studied is composed of at most 12 Sun ELC SPARCstations. We varied the number of
workstations from 1 to 12, first ensuring that none of the workstations were executing long running
jobs. In general the only interference is from more trivial usage such as editing files, reading mail,
news, etc. For each number of workstations considered we ran the parallel program 10 times for
each parameter value and calculated the mean of these 10 runs as our metric. Given the number
of workstations, the input parameter to our parallel program is the problem size. We consider five
different problem sizes: 1, 2, 4, 8, and 16 minutes are the service demands of these problems on a
single dedicated machine. No attempt has yet been made to provide confidence intervals or more
detailed statistical analysis.

In figure 10 we plot the maximum task execution time versus the number of workstations for the
five different job demands assuming a fixed problem size. The solid lines are the measured values
from our experiment. The dashed lines are predictions from our analytical model where the input
parameter for workstation owner utilization is set to 4%. We obtained this value by computing
the mean of the machine utilizations (using the UNIX uptime command) over two working days
when no PVM programs were executing. The model's qualitative and quantitative predictions are in
close agreement with the measured results.

In figure 11 we plot the speedup versus the number of workstations. The values plotted were
obtained from measurement of the system. In this case we define speedup as the ratio of the
maximum task execution time using one workstation over the maximum task execution time using W
workstations. The utilization of the machines is very low and thus there is not significant
degradation of parallel program performance. In a more heavily loaded system we would expect much
more degradation. Focusing on the 8 and 12 workstation cases we see that the speedup decreases as the
job demand decreases, i.e. the speedup for a job demand of 1 is lower than the speedup for a job demand
of 16. This is because the task ratio is smaller for a job demand of 1 than it is for a job demand of
16. This experiment thus qualitatively validates the analysis. Note that the analysis shows a more
significant drop in speedup as system size increases. Unfortunately we only have 12 homogeneous
workstations with which to validate our results and hence cannot experimentally validate this result.

5 Conclusions and Discussion

In this paper we have developed an abstract model of a distributed computing system to determine
the feasibility of using distributed computing in a non-dedicated system assuming workstation owner
processes have preemptive priority over parallel tasks. The model is an abstraction of a parallel
program ignoring communication and synchronization overheads. We assume the targeted parallel
programs execute efficiently on a dedicated distributed system, hence we can ignore these overheads
and focus on the impact of a non-dedicated environment. The purpose of considering a non-dedicated
system is to determine if idle workstations (wasted cycles) can be utilized to reduce execution time
and to solve large problems.

For fixed size problems we have found that good speedups can be achieved, but only if the
amount of work allocated to each machine is sufficiently large compared to the mean service demand
of workstation processes. Hence, for non-dedicated systems where the workstation owner processes
have preemptive priority over parallel tasks, the parallel task demand to owner task demand ratio
(task ratio) is a determining factor in performance of the parallel program. In particular, we find that
the task ratio should be at least 8 for a parallel job to achieve 80 percent of the possible speedup, even
adjusting for system utilization, for a system in which each homogeneous workstation has a utilization
of 5 percent. In addition, the task ratio needed to achieve 80 percent of the possible speedup increases
with system utilization. At a utilization of 10 percent the task ratio must be 13 or higher, and at a
utilization of 20 percent the task ratio must be 20 or greater.

The model proposed in this paper assumes local workstation processes have deterministic service
requirements. This assumption implies that the results presented in this paper are conservative. Hence,
even larger task ratios are likely to be necessary to achieve good performance. Thus, based on our
study, distributed computing in a non-dedicated environment where workstation owner processes have
preemptive priority over parallel tasks is a viable approach only if the task ratio is sufficiently large.
The exact size of the ratio needed is both application and environment dependent.

For scaled problems under a non-dedicated environment, we have found that distributed computing
offers significant potential for the efficient execution of scaled problems. In particular, assuming
each workstation in the system has a utilization of 5 percent (20 percent), mean job response time
is only increased by 30 percent (71 percent) when comparing the response time of a scaled problem
using 100 workstations relative to that of a problem using one workstation with a 5 percent (20
percent) utilization. The performance difference between fixed size and scaled problems is due to
the fact that the task ratio of scaled problems is fixed, while the task ratio of fixed size problems
decreases as the number of workstations increases. Note that the results are based on our idealized
assumptions and hence are optimistic. The actual response time of these problems would be dependent
on communication bandwidth requirements which are ignored in our model.

We  assume  the  workload  of  the  non-dedicated  environment  is  light  and  the  effect  of  long  running 
workstation  owner  jobs  is  not  considered.  How  to  provide  reasonable  execution  times  for  parallel  jobs 
in  a  non-dedicated  system  with  long  running  workstation  owner  jobs  must  be  solved  if  distributed 
computing  is  to  be  feasible  in  a  non-dedicated  environment.  Currently  our  model  only  provides  some 
initial  insights  into  the  general  problem  of  distributed  computing  in  a  non-dedicated  system.  In  the 
future  we  intend  to  extend  the  model  to  handle  more  complex  workloads.  In  addition,  we  are  currently 
pursuing  further  experimental  validation  of  our  model. 


References 

[1] Beguelin, A., Dongarra, J.J., Geist, G.A., Mancheck, R., and Sunderam, V.S., "A Users' Guide
to PVM Parallel Virtual Machine", Technical Report ORNL/TM-11826, Oak Ridge National
Laboratory, July 1991.

[2] Beguelin, A., Dongarra, J.J., Geist, G.A., Mancheck, R., Moore, K., and Sunderam, V.S., "Tools
for Heterogeneous Network Computing", Proc. 6th SIAM Conf. on Parallel Processing for
Scientific Computing, Vol. 2, March 1993.

[3] G. Fox et al., Solving Problems on Concurrent Processors, Prentice-Hall Inc., 1988.

[4] Kobayashi, H., Modeling and Analysis, Addison-Wesley, 1978.

[5] Geist, G.A., and Sunderam, V.S., "Experiences With Network Based Concurrent Computing
on the PVM System", Technical Report ORNL/TM-11760, Oak Ridge National Laboratory,
January 1991.

[6] J.L. Gustafson, G.R. Montry, and R.E. Benner, "Development of Parallel Methods for a
1024-Processor Hypercube", SIAM J. on SSTC, Vol. 9, No. 4, 1988.

[7] Sauer, C.H., Chandy, K.M., Computer System Performance Modeling, Prentice-Hall, 1981, page
16.

[8] Schwetman, H.D., "CSIM: A C-Based Process-Oriented Simulation Language", Proc. of the 1986
Winter Simulation Conference, December 1986.

[9] Sunderam, V.S., "PVM: A Framework for Parallel Distributed Computing", Concurrency:
Practice and Experience, Vol. 2, No. 4, December 1990.

[10] Xian-He Sun and L. Ni, "Another View on Parallel Speedup", Proc. of Supercomputing '90, Nov.
1990.

[11] Xian-He Sun and L. Ni, "A Structured Representation for Parallel Algorithm Design on
Multicomputers", Proc. of the Sixth Conf. on Distributed Memory Computing, April 1991.

[12] Xian-He Sun and L. Ni, "Scalable Problems and Memory-Bounded Speedup", J. of Parallel and
Distributed Computing, Vol. 19, Sept. 1993.


[Figures: plots not reproduced. Recoverable axis labels: Efficiency, Weighted Speedup, Weighted Efficiency.]

Figure 5: Weighted Speedup, J = 10,000 units

Figure 7: Effect of Task Ratio, 60 Workstations

Figure 8: Effect of Task Ratio, Number Workstations Varied, Owner Utilization

Figure 11: Experimental Validation: Speedups